Abstract: Due to the increasing digitalization of information, a huge amount of data is being generated, and the information richness of this data has attracted considerable research interest. A major problem with real-time data is that it is usually huge and imbalanced. This paper analyses a tree-based real-time intrusion detection technique for detecting intrusions in highly imbalanced Big Data. Classifiers tend to exhibit lower accuracy and reliability as the level of imbalance in the data increases. Hence, highly imbalanced data is applied to the proposed classifier to determine its efficiency. Sampling techniques are among the most widely used methods for reducing the impact of imbalance on classifiers. Sampling techniques were therefore applied to the data, and the threshold limits of imbalance that the proposed classifier can handle effectively were identified.
Keywords: Classifier; Tree-based Intrusion Detection; Sampling; Oversampling; Undersampling; Imbalance; Big Data.
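The abstract does not state which sampling techniques were used, so the following is only a minimal sketch, assuming plain random oversampling and random undersampling on labelled records. It shows how the imbalance level can be measured (here assumed to be the majority-class count divided by the minority-class count) and how each sampling strategy rebalances the classes. The function names `imbalance_ratio`, `random_oversample` and `random_undersample` are hypothetical illustrations, not the paper's implementation.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen records (with replacement) of every smaller
    class until each class has as many records as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset (without replacement) of every class so that
    each class shrinks to the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        members = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(members, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical toy data: 95 normal records and 5 attack records.
    X = list(range(100))
    y = ["normal"] * 95 + ["attack"] * 5
    print(imbalance_ratio(y))  # 19.0
    _, y_over = random_oversample(X, y)
    print(Counter(y_over))
    _, y_under = random_undersample(X, y)
    print(Counter(y_under))
```

Oversampling keeps every original record but may encourage overfitting to duplicated minority examples, whereas undersampling discards majority records and may lose information; which trade-off suits a given intrusion detection classifier is an empirical question.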